Abstract — Automatic analysis of biomedical time series such
as electroencephalogram (EEG) and electrocardiogram (ECG)
signals has attracted great interest in the biomedical
engineering community due to its important applications in medicine.
In this work, a simple yet effective bag-of-words representation 
that is able to capture both local and global structure similarity 
information is proposed for biomedical time series representation. 
In particular, similar to the bag-of-words model used in text 
document domain, the proposed method treats a time series 
as a text document and extracts local segments from the time 
series as words. The biomedical time series is then represented
as a histogram of codewords, each entry of which is the
number of times a codeword appears in the time series. Although the
temporal order of the local segments is ignored, the bag-of-words 
representation is able to capture high-level structural information 
because both local and global structural information are well 
utilized. The performance of the bag-of-words model is validated 
on three datasets extracted from real EEG and ECG signals. The 
experimental results demonstrate that the proposed method is not 
only insensitive to parameters of the bag-of-words model such as 
local segment length and codebook size, but also robust to noise. 

Index Terms — bag of words, codebook construction, clustering, 
time series classification. 



I. Introduction 

WITH the development of modern technology and the reduction
of hardware cost, a large amount of biomedical signals such as
electroencephalogram (EEG) and electrocardiogram (ECG) signals
are collected every day. These biomedical signals are very useful
for monitoring a person's physical condition. It is, however, a
challenging task to analyze these signals efficiently and
effectively. Traditionally, these signals are analyzed manually
by professional experts, which has several disadvantages. Firstly,
compared with the large amount of biomedical signals, the
number of professional experts, especially those with
extensive experience, is very limited. Secondly, inspection and
monitoring of long-term biomedical signals such as EEG and
ECG signals are very time consuming. It is difficult to
keep a high level of concentration during a lengthy inspection,
which increases the operator's false hit rate.
Finally, inter-reader variability is frequently found in
manual inspection and monitoring by experts. Therefore,
an automated system that can assist professional experts in
analyzing long-term biomedical signals is very valuable in
real-world applications.

Automatic analysis of biomedical time series such as EEG
and ECG signals based on machine learning techniques has
been applied to a variety of real-world applications. For
instance, EEG signals are automatically analyzed for epileptic
seizure detection [1], [2], [3], brain computer interaction [4], [5],
[6], human mental fatigue detection [7] and emotion
recognition [8]. ECG signals that provide useful information about
heart rhythm are used to study heart arrhythmias [9], [10]. It is
essential to extract meaningful features to represent individual
time series in the aforementioned applications. Some methods
[11], [12] directly describe time series in the time domain while
others extract features from a transformed domain [2], [13],
[10]. For instance, Zadeh et al. [12] extracted morphological
and timing-interval features from ECG segments to classify
heartbeats. Guo et al. [2] extracted line length features based
on the Discrete Wavelet Transform (DWT) to detect epileptic EEG
segments.

Most of the previous representations extract local temporal
or frequency information to characterize time series, which is
very effective for short time series or time series with periodic
waveforms. However, they may have limited ability to capture
the structural similarity of long time series that have repetitive
but non-periodic waveforms, for instance, ECG and EEG signals.
In order to capture the high-level structural information of time series,
Lin [14] proposed a bag-of-patterns (BoP) representation by
converting a time series to a word string using the Symbolic
Aggregate approXimation (SAX). The temporal order of local
segments, i.e., local patterns, in a time series is ignored and
all the local segments in the time series are histogrammed to
construct a bag-of-patterns representation. The bag-of-patterns
representation is effective in capturing the structural similarity
of time series. However, one drawback of the bag-of-patterns
representation is that its dimension may be very high, which
limits its application to large datasets. For instance, when
the size of the alphabet r and the number of symbols w are
4 and 8, respectively, the dimension of the bag-of-patterns
representation could reach $r^w = 4^8 = 65536$.

In this work, motivated by the success of the bag-of-words
model in text document analysis [15], [16] and image
analysis [17], [18], we propose a simple yet effective bag-of-words
representation whose dimension is much lower than that of the
bag-of-patterns representation to characterize biomedical time
series. The bag-of-words representation is able to capture
high-level structural information of time series because it uses
both local and global information. Moreover, it can be
used to represent streaming data and time series with different
lengths because it is incrementally constructed.

Fig. 1. The flowchart of the proposed bag-of-words approach for analysis of biomedical time series.

The bag-of-words model was originally developed for document
representation. The basic idea is to define a codebook
that contains a set of codewords and then represent a document
as a histogram of the codewords, each entry of which is the
number of times a codeword occurs in the document. Although the
order information of words is ignored, the bag-of-words model
is still very effective in capturing document information because
the frequency information of codewords in documents is well
exploited. Recently, the bag-of-words model has been extended to
analyze images and videos in computer vision [17], [18]. Local
patches extracted from images or videos are treated as words
and the codebook is constructed by clustering all the local
patches in the training data. Similar to the extension of the
bag-of-words representation in computer vision, we here extend the
bag-of-words representation to characterize biomedical time
series by regarding local segments extracted from time series
as words and treating the time series as documents.



A. Overview of the Proposed Approach 

In the bag-of-words representation, a time series is treated
as a text document and local segments extracted from the
time series are treated as words. The general flowchart of the proposed
method is shown in Fig. 1. Firstly, we continuously
slide a window with a pre-defined length along the time series
to extract a group of local segments. Then, we extract a
feature vector from each of the local segments using the DWT.
Next, similar to the bag-of-visual-words model in image and
video analysis [17], [18], all local segments from the training
time series are clustered by k-means clustering to create a
codebook, i.e., the cluster centers are treated as codewords.
Then, each local segment is assigned the codeword that has the
minimum distance to it, and the time series
is represented as a histogram of the codewords. Finally, the
bag-of-words representation is used as input for classification.



B. Contribution and Organization 

The main contribution of the paper is twofold: (i) a simple
yet effective bag-of-words representation is proposed for the
analysis of biomedical time series; (ii) a series of experiments is
conducted to investigate the effectiveness and robustness of the
bag-of-words representation for classification of biomedical
time series.

The paper is organized as follows. In
Section II, the proposed method, including the bag-of-words
representation, distance measures and classification method,
is described. Section III describes the biomedical time series
datasets used in the experiments. Experimental results are
reported and analyzed in Section IV. Discussion and conclusion
are given in Section V and Section VI, respectively.



II. Proposed Method 

In this section, we describe the bag-of-words representation
for biomedical time series classification. The bag-of-words
representation ignores the temporal order of local segments
within a time series and represents the time series as a
histogram of the codewords assigned to its local segments. Several distance
measures are then introduced for histogram comparison.



A. Bag-of-words Representation 

The procedure of generating the bag-of-words representation
is illustrated in Fig. 2. We continuously slide a window
with a pre-defined length along a time series and extract a
group of local segments from the time series. A feature
vector is then extracted from each of the local segments using
the DWT to characterize the local segment. All the local
segments from the training data are clustered to construct a
codebook that contains a set of codewords, i.e., the cluster
centers. Then, each local segment is assigned the codeword that
has the minimum distance to it. The bag-of-words
representation ignores the order of local segments in
a time series and represents the time series as a histogram of
codewords.

Fig. 2. The procedure of generating the bag-of-words representation. The codebook is constructed by clustering all local segments from training data. The
"circle", "triangle", "square" and "hexagon" stand for the basis elements, i.e., codewords, in the codebook. Each local segment is assigned a codeword and
a histogram representation is extracted for each time series by histogramming the codewords in the time series. The figure is best viewed in color.



1) Local Segments Extraction: A group of local segments
is extracted from each time series by continuously sliding
a window with a pre-defined length along the time series. As
local segments from different time series may be at different
scales, all the local segments are normalized to zero mean and
unit standard deviation. We transform each local segment into the wavelet
domain and use the approximation coefficients of the DWT
as a feature vector to represent the local segment.
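As an illustration of the extraction and normalization steps just described, a minimal Python sketch is given below. The function names `extract_segments` and `z_normalize`, the `step` argument, and the handling of flat segments are assumptions made for this example and are not taken from the released MATLAB code.

```python
import math

def extract_segments(series, length, step=1):
    """Slide a window of `length` points along `series` in strides of `step`."""
    return [series[s:s + length]
            for s in range(0, len(series) - length + 1, step)]

def z_normalize(segment, eps=1e-12):
    """Rescale a segment to zero mean and unit standard deviation."""
    n = len(segment)
    mean = sum(segment) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in segment) / n)
    if std < eps:
        # Flat segment: return zeros instead of dividing by ~0 (assumed convention).
        return [0.0] * n
    return [(v - mean) / std for v in segment]

series = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
segments = [z_normalize(s) for s in extract_segments(series, length=4, step=2)]
```

With a toy series of six points, a window of length 4 and a step of 2, two normalized segments are produced; real EEG or ECG series would simply use longer windows.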

The wavelet transform, which analyzes a signal at different
frequency bands, provides both accurate frequency information
at low frequencies and accurate time information at high frequencies,
which are very important for biomedical signal analysis. The
choice of wavelet function and the number of decomposition
levels is of importance for the multiresolution decomposition.
In this work, a single-level DWT with the order 3 Daubechies
wavelet function (db3) is employed to decompose a local
segment into approximation coefficients and detail
coefficients. Similar to the work in [13], we use the approximation
coefficients as a feature vector to represent the local segment.
We do not directly use the raw values of local segments
as feature vectors because features based on the
approximation coefficients are not only more robust to noise
than features based on raw segments but also have nearly half
the dimension of the local segments.
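The following Python sketch shows one way a single-level db3 approximation could be computed for a local segment. It is only an illustration under stated assumptions: the boundary is handled by periodic extension (which yields exactly half the input length, whereas the "nearly half" above suggests the original implementation pads differently), and the name `dwt_approximation` is hypothetical.

```python
import math

# Daubechies-3 scaling (low-pass) filter, six taps; the taps sum to sqrt(2).
DB3_LOW = [
    0.3326705529500825, 0.8068915093110924, 0.4598775021184914,
    -0.1350110200102546, -0.0854412738820267, 0.0352262918857095,
]

def dwt_approximation(segment, lowpass=DB3_LOW):
    """Single-level DWT approximation coefficients with periodic extension."""
    n = len(segment)
    if n % 2:
        raise ValueError("this sketch expects an even-length segment")
    # Low-pass filter and keep every second output (downsampling by 2).
    return [
        sum(h * segment[(2 * k + j) % n] for j, h in enumerate(lowpass))
        for k in range(n // 2)
    ]

features = dwt_approximation([1.0] * 128)
```

For a segment of length 128 the sketch returns 64 coefficients, so each local segment is described by a feature vector of roughly half its original dimension.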

2) Codebook Formulation: In text document analysis,
a codebook (vocabulary) is a set of pre-defined words, which
are also called codewords. The bag-of-words method counts
the number of occurrences of each codeword in a document and
provides a document-level representation using a histogram
of codewords. In image and video analysis, such a codebook is
generally created by clustering a group of local
patches from training data, i.e., the codewords are defined
as the cluster centers. The codeword that is nearest to a
local patch is then assigned to the local patch. The spatial and
temporal order information of local patches (codewords) is
ignored and an image or video is represented as a histogram
of the codewords in the image or video. The classical k-means
clustering algorithm [17], [18] is commonly used to construct
the codebook, although other unsupervised and supervised
methods have also been developed, such as mean-shift [19] and
supervised Gaussian mixture models [20].

Similar to the codebook construction in image and video
analysis, we cluster all the local segments from the training time
series using k-means clustering to construct the codebook.
The cluster centers estimated by k-means clustering are
regarded as basis elements of the codebook, i.e., codewords.
Suppose a group of local segments $X = [x_1, x_2, \ldots, x_n]$,
where $x_i \in \mathbb{R}^d$, are extracted from the training time series. The
codebook construction by k-means clustering is formulated as
the optimization problem:

$$\min_{B,\, v_1, \ldots, v_n} \sum_{i=1}^{n} \| x_i - B v_i \|_2^2, \quad \text{s.t. } \operatorname{card}(v_i) = 1,\ |v_i| = 1,\ v_i \geq 0,\ \forall i, \tag{1}$$

where $B \in \mathbb{R}^{d \times K}$ contains the cluster centers and the vector $v_i$ is
the cluster indicator of the local segment $x_i$, i.e., a unit
basis vector that has exactly one component equal to one and
all the other components equal to zero. The codebook $B \in \mathbb{R}^{d \times K}$
has K codewords, each of which is a d-length vector, the
same length as the local segments. It is worth noting that the
codebook only needs to be learned once from the training data
and is then shared by both training and test data.

Fig. 3. The bag-of-words representation of an example time series. See the corresponding text for more details.
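The codebook construction can be sketched with a plain Lloyd-style k-means loop, as below. This is a minimal illustration rather than the implementation used in this work: the function name `kmeans_codebook`, the random initialization from data points, the fixed seed, and the iteration cap are all assumptions.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_codebook(features, k, iterations=50, seed=0):
    """Lloyd's algorithm: returns k cluster centres used as codewords."""
    rng = random.Random(seed)
    centres = [list(f) for f in rng.sample(features, k)]
    for _ in range(iterations):
        # Assignment step: group every feature with its nearest centre.
        groups = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k), key=lambda j: squared_distance(f, centres[j]))
            groups[nearest].append(f)
        # Update step: move each centre to the mean of its group
        # (an empty group keeps its previous centre).
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centres[j]
            for j, g in enumerate(groups)
        ]
        if updated == centres:
            break
        centres = updated
    return centres

features = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
            [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
codebook = kmeans_codebook(features, k=2)
```

On this toy input the two returned centres lie near the two obvious groups of points; in the setting of the paper, `features` would hold the DWT feature vectors of the training segments and `k` the codebook size K.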

The codebook size K is of importance to the bag-of-words 
representation. A compact codebook with too few entries has 
a limited discriminative ability, while a large codebook is 
likely to introduce noise due to the sparsity of the codewords 
histogram. Therefore, the size of the codebook should well 
balance the trade-off between discrimination and noise. 

3) Codewords Assignment: Once the codebook is constructed,
each local segment is assigned the codeword that has the
minimum distance to it. Specifically, suppose
that a codebook with K entries, $B = \{b_1, b_2, \ldots, b_K\}$, is
learned from the training data. A local segment $x_i$ is assigned
the $c^*$th codeword, where $c^* = \arg\min_j d(b_j, x_i)$ and $d(\cdot, \cdot)$
is the Euclidean distance function.

After each local segment is assigned a codeword, the
temporal order of local segments is ignored and a time
series is represented as a histogram of the codewords in the time
series, each entry of which specifies the number of times a codeword
occurs in the time series. Fig. 3 illustrates the bag-of-words
representation of an example EEG time series. The figure
in the first row is the example EEG time series. The three
figures in the second to fourth rows (left) are three local
segments of length 160 extracted from the time series,
and the three figures in the second to fourth rows (right) are
the corresponding codewords assigned to the local segments
from the codebook, which consists of 1000 codewords. The three
local segments are assigned the 432nd, 118th, and 628th
codewords, respectively. The figure in the last row is the
bag-of-words representation of the time series, each entry of which
gives the number of times a codeword occurs in the time series.
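The assignment and histogram steps can be illustrated with the short Python sketch below. The names `nearest_codeword` and `bag_of_words` and the toy codebook are hypothetical; squared Euclidean distance is used because it selects the same nearest codeword as the Euclidean distance.

```python
def nearest_codeword(feature, codebook):
    """Index of the codeword with minimum Euclidean distance to `feature`."""
    def sq_dist(b):
        return sum((x - y) ** 2 for x, y in zip(feature, b))
    return min(range(len(codebook)), key=lambda j: sq_dist(codebook[j]))

def bag_of_words(features, codebook):
    """Histogram of codeword counts; the temporal order is discarded."""
    histogram = [0] * len(codebook)
    for f in features:
        histogram[nearest_codeword(f, codebook)] += 1
    return histogram

codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
features = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [3.7, 0.2]]
histogram = bag_of_words(features, codebook)  # [1, 2, 1]
```

Reversing the order of `features` gives the same histogram, which is exactly the order-invariance property of the representation.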



B. Classifier 

Some discriminative classifiers such as Artificial Neural
Networks (ANN) [2], Support Vector Machines (SVM) [21],
and Probabilistic Neural Networks (PNN) [13] are widely
used for biomedical signal classification. Since our goal in
this paper is to investigate the effectiveness of the bag-of-words
representation, here we use the simplest classifier, i.e.,
the 1-Nearest Neighbor (1-NN) classifier. Let t be a test
time series and $R^i$ represent the training time series from the ith
category. The test time series is assigned the class of the
training sample that has the minimal distance to it,
i.e., $C^* = \arg\min_i D(t, R^i)$, where $D(\cdot, \cdot)$ is the similarity
measure defined in the following.
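A 1-NN decision rule over histograms can be sketched in a few lines of Python. The function name `classify_1nn`, the toy histograms and labels, and the choice of the Euclidean distance for the demonstration are assumptions of this example; any histogram distance can be passed in as `distance`.

```python
def classify_1nn(test_hist, train_hists, train_labels, distance):
    """Return the label of the training histogram closest to `test_hist`."""
    best = min(range(len(train_hists)),
               key=lambda i: distance(test_hist, train_hists[i]))
    return train_labels[best]

def euclidean(h, k):
    return sum((a - b) ** 2 for a, b in zip(h, k)) ** 0.5

train_hists = [[5, 0, 1], [0, 4, 2], [1, 1, 6]]
train_labels = ["A", "B", "C"]
predicted = classify_1nn([4, 1, 1], train_hists, train_labels, euclidean)
```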

C. Similarity Measure 

Many similarity measures have been proposed for histogram
comparison. In the following, we describe four commonly
used similarity measures for measuring the distance between
two bag-of-words representations.

1) Euclidean Distance: The Euclidean distance between
histogram h and histogram k is defined as:

$$D_{L_2}(h, k) = \sqrt{\sum_{i=1}^{K} \left( h(i) - k(i) \right)^2}, \tag{2}$$

where $D_{L_2}(h, k)$ is the Euclidean distance, which is
commonly used in pattern recognition.

2) Chi-Squared Distance: The Euclidean distance compares
the two histograms bin by bin and lets every pair of bins contribute
equally to the distance. The problem is that some words such
as "the", "but" and "however" occur more frequently in
documents; therefore, they contribute more to the distance in the
Euclidean distance measure, although they may actually carry less
discriminative information than rarely occurring codewords.
This leads to the Chi-Squared distance ($\chi^2$ distance):

$$D_{\chi^2}(h, k) = \sum_{i=1}^{K} \frac{\left( h(i) - k(i) \right)^2}{h(i) + k(i) + \epsilon}, \tag{3}$$

where $\epsilon$ is a small value to avoid dividing by zero. The $\chi^2$
distance introduces a normalization that emphasizes the rarely
occurring codewords, because common words are always shared
between documents from different categories.

TABLE I

The three datasets used in the experiments.

3) Jensen-Shannon Distance: Each entry of the bag-of-words
representation can be interpreted as the frequency with which a
codeword occurs in a time series. Therefore, the histogram
stands for a probability distribution over a discrete random
variable. A simple measure to compare two distributions is
the Kullback-Leibler (KL) divergence:

$$D_{KL}(h \| k) = \sum_{i=1}^{K} h(i) \left( \log_2 h(i) - \log_2 k(i) \right). \tag{4}$$

The KL divergence becomes zero if and only if h and k are
the same. In order to keep the distance symmetric, the
Jensen-Shannon distance [22] is introduced as a symmetric
extension:

$$D_{JS}(h, k) = \frac{1}{2} \left( D_{KL}(h \| k) + D_{KL}(k \| h) \right). \tag{5}$$


4) Histogram Intersection based Distance: The histogram
intersection, which counts the total overlap between two
histograms, is able to address the problem of partial matches when
the two histograms have different sums over all the bins. The
distance based on the histogram intersection is defined as [23]:

$$D_{HI}(h, k) = 1 - \sum_{i=1}^{K} \min\left( h(i), k(i) \right), \tag{6}$$

where h and k are normalized histogram vectors. Two
histograms that have a larger overlap obtain a smaller distance.
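The four measures of this subsection can be written compactly in Python as sketched below. Several choices are assumptions of this example rather than details from the text: the small constant `eps` is also added inside the logarithms so that empty bins do not produce infinities, the symmetric measure is computed as the average of the two KL divergences, and the intersection term uses the bin-wise minimum, which is the usual definition of histogram intersection.

```python
import math

def euclidean(h, k):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h, k)))

def chi_squared(h, k, eps=1e-10):
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h, k))

def kl_divergence(h, k, eps=1e-10):
    # eps keeps log2 finite for empty bins (added smoothing, an assumption).
    return sum(a * (math.log2(a + eps) - math.log2(b + eps))
               for a, b in zip(h, k))

def symmetric_kl(h, k):
    """Average of the two directed KL divergences."""
    return 0.5 * (kl_divergence(h, k) + kl_divergence(k, h))

def intersection_distance(h, k):
    """One minus the overlap of two normalized histograms."""
    return 1.0 - sum(min(a, b) for a, b in zip(h, k))

def normalize(h):
    total = float(sum(h))
    return [v / total for v in h]

h = normalize([2, 1, 1])  # [0.5, 0.25, 0.25]
k = normalize([1, 1, 2])  # [0.25, 0.25, 0.5]
distances = {
    "euclidean": euclidean(h, k),
    "chi_squared": chi_squared(h, k),
    "symmetric_kl": symmetric_kl(h, k),
    "intersection": intersection_distance(h, k),
}
```

All four functions take two histograms of equal length, so any of them can be plugged into a nearest-neighbor classifier as the distance argument.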

D. Practical Implementation 

A large number of local segments may be extracted from
training data, especially for large datasets. Clustering a large
number of local segments to construct the codebook is
computationally expensive. In practice, instead of using all the local
segments extracted from the training data, we perform
k-means clustering on a subset of local segments randomly
selected from the training data to construct the codebook.
This strategy is also employed in image and video analysis to
reduce the computation of codebook construction [17], [18].

We continuously slide a window along a time series to
extract local segments. However, when the time series contains
too many data points, a large number of local segments will
be extracted from the time series, which requires expensive
computation. For instance, for a time series consisting of
2000 data points, about 1900 local segments will be extracted
using a window of length 100. In practice, we can slide
the window with a step of n data points (n = 2, 4, 6 or 8)
along the time series to reduce the number of local segments
extracted from the time series.
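The effect of the step size on the number of extracted segments follows from simple counting, as the small Python sketch below shows. The function name `num_segments` is hypothetical, and the sketch assumes that only windows lying entirely inside the series are kept.

```python
def num_segments(series_length, window_length, step=1):
    """Number of full windows obtained when sliding in strides of `step`."""
    if series_length < window_length:
        return 0
    return (series_length - window_length) // step + 1

# Series of 2000 points, window of length 100, for several step sizes.
counts = {step: num_segments(2000, 100, step) for step in (1, 2, 4, 6, 8)}
```

For the example in the text, a step of 1 gives 1901 segments, while a step of 8 reduces this to 238.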

The MATLAB code of the bag-of-words representation
in this work was made publicly available at
http://www.mathworks.com/matlabcentral/fileexchange/38050



III. Experimental Datasets

In this study, three datasets constructed from EEG and ECG
signals are used to evaluate the performance of the bag-of-words
representation. The first dataset is collected from EEG
signals and is widely used for automatic epileptic seizure
detection. The other two datasets are extracted, with random
start points, from long ECG signals (more than 1000000 points)
that were collected from different subjects. Each of the long
ECG signals corresponds to a class, i.e., a subject's identity. In
order to demonstrate that the bag-of-words representation can
be used for time series with different lengths, the third dataset is
extracted with lengths between 2048 and 4096, while
the time series in the first two datasets have fixed lengths of
4096 and 2048, respectively.

It is worth noting that although the extracted ECG time
series in the same class are obtained from the same long
ECG signal, there exist substantial intra-class variations. The
aim of the ECG signal classification in our experiment is
to attribute each instance, i.e., each extracted ECG time series,
to its subject's identity, which can be used for human
identification from ECG signals in real applications [24], [25].
This task may not be difficult when comparing features extracted
from heartbeat waveforms or fiducial points, for instance,
the cross-correlation among QRS complexes. However, the
bag-of-words representation does not need to detect or localize
any heartbeat waveforms or fiducial points, which is always
required in previous works [24], [25].



A. EEG Dataset 

The EEG dataset described in [26] was used in our
experiments. The complete EEG dataset consists of five classes
(i.e., A, B, C, D, and E), each of which contains 100 single-channel
EEG sequences of the same length 4096. All the
signals were recorded with the same 128-channel amplifier
system and visually inspected for artifacts. Set A and set B were
collected from surface EEG recordings of five healthy subjects
with eyes open and eyes closed, respectively. The other three sets
(C, D and E) were taken from intracranial EEG recordings of five
patients suffering from epilepsy. Set D and set C were taken
from the epileptogenic zone and the hippocampal formation
of the opposite hemisphere of the brain, respectively. Set C
and set D were recorded in seizure-free intervals, whereas set
E only contains seizure activity. Fig. 4 shows example time
series from each of the five classes.

B. ECG-40 Dataset 

The ECG-40 dataset was obtained from the Fantasia ECG
database [27]. The database consists of twenty young and
twenty elderly healthy subjects. Forty long ECG signals were
collected, one from each of the forty subjects, who were monitored
for about two hours at a sampling rate of 250 Hz. All the signals
are very long and contain more than 1000000 data points.
We extracted fifty time series of length 2048 from each of the
forty long signals with random start points. In total, the ECG-40
dataset contains 2000 time series of length 2048, which are
evenly distributed over the forty classes.

C. ECG-15 Dataset

The ECG-15 dataset consists of 1500 time series extracted
from fifteen long ECG signals in the BIDMC Congestive Heart
Failure Database [27]. The fifteen long ECG signals were
recorded from fifteen patients suffering from severe congestive
heart failure. One hundred time series of length between 2048
and 4096 were extracted from each of the fifteen long ECG
signals with random start points. In total, the ECG-15 dataset
consists of fifteen classes, each of which has 100 time series
of length between 2048 and 4096.

Table I summarizes the three datasets used in the experiments.
It should be noted that the lengths of the 1500 time
series in the ECG-15 dataset are not the same and vary
between 2048 and 4096.

IV. Results 

In this section, we report experimental results on the three
datasets. Firstly, we investigated the impact of parameters by
varying the length of local segments and the codebook size
K based on different distance measures. Then, we compared
the proposed method with the Discrete Wavelet Transform
(DWT) [13] representation, the Discrete Fourier Transform
(DFT) [28] representation, the NN classifier based on the Dynamic
Time Warping (DTW) [29] distance and the bag-of-patterns
representation (BoP) [14]. In addition, we compared the
classification accuracies achieved by the proposed method with
those achieved by other state-of-the-art methods on the EEG
dataset. Finally, we investigated the robustness of the bag-of-words
representation to noise. In order to ensure an unbiased
evaluation, each dataset is randomly partitioned into 10 subsets.
Nine subsets are used for training while the remaining one
is retained for testing. The classification process is then repeated
10 times with each of the 10 subsets used exactly once as test
data.
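The 10-fold partitioning described above can be sketched in Python as follows. The generator name `k_fold_indices`, the fixed seed, and the strided way of forming the folds are assumptions of this example. Since the codebook is learned from training data, it would be rebuilt from the nine training folds in each round.

```python
import random

def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds nearly equal subsets.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]

splits = list(k_fold_indices(500))  # e.g. the 500 EEG sequences
```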



A. Length of Local Segments 

We varied the length of local segments between 8 and 256 in
the experiments. The determination of such parameter ranges
relies on the fact that biomedical time series such as
ECG and EEG signals are relatively flat. The classification
accuracies on the EEG, ECG-40 and ECG-15 datasets with a
codebook size of 1000 using the Chi-Squared distance are
illustrated in Fig. 5(a), Fig. 5(b) and Fig. 5(c), respectively. From
the experimental results, it can be seen that the performance
is relatively stable with respect to the length of local segments
when it is between 64 and 192. The classification accuracies
decrease considerably when the length is less than 16. This is
mainly due to the fact that a local segment that is too short or too
long cannot capture local structure information within
time series. In the following experiments, we empirically set
the length of local segments to 128.

B. Codebook Size 

To show the performance of the bag-of-words representation
with respect to the size of the codebook, we report the
classification accuracies on the three datasets in Fig. 6, increasing
the size of the codebook from 10 to 3500. We can see that the
results become very stable when the size of the codebook is
larger than 500. The classification accuracies drop quickly
if the size of the codebook is less than 100, which confirms
that a compact codebook with too few entries has limited
discriminative ability. The optimal size of the codebook can
be roughly identified as between 1000 and 3500.

C. Distance Measurement 

We compared the classification performance on the three
datasets using the four similarity measures described in
Section II-C. Fig. 7 shows the classification accuracies
based on the four distance measures with codebook sizes of
10, 100, 1000 and 2000. We can see that the results differ only
slightly across the distance measures, indicating that the
distance measures have limited impact on the performance of
the bag-of-words representation. Overall, the Chi-Squared
distance measure performs slightly better than the other measures
for all four codebook sizes.



D. Comparison with Other Methods 

We compared the performance of the proposed bag-of-words
representation with that of the DWT representation [13],
the DFT representation [28], and the NN classifier based on
the DTW distance [29]. In addition, we also compared the
proposed bag-of-words representation with the bag-of-patterns
representation [14], which is very similar to the proposed
approach.

Fig. 5. Classification accuracies with respect to the length of segments on the EEG (a), ECG-40 (b) and ECG-15 (c) datasets, respectively.

Fig. 6. Classification accuracies with respect to the codebook size on the EEG (a), ECG-40 (b) and ECG-15 (c) datasets, respectively.

Fig. 7. Classification accuracies using different distance measures on the EEG (a), ECG-40 (b) and ECG-15 (c) datasets, respectively. The figure is best
viewed in color.



TABLE II

COMPARISON OF RESULTS (CLASSIFICATION ACCURACY, %) ON THE THREE DATASETS
USING DIFFERENT METHODS.

Methods            EEG     ECG-40   ECG-15
DWT                76.0    25.1     20.1
DFT                91.6    85.6     60.6
DTW                71.6    74.5     85.5
BoP [14]           87.8    99.4     99.8
Proposed method    93.8    99.5     100



• DWT that represents a signal in multiresolution is able to
capture both frequency and location information of time
series. Similar to the DWT based feature used in [13], we
used the Daubechies wavelet (db2) and decomposed the
time series into 4 levels. The detail wavelet coefficients of
the four levels and the approximation wavelet coefficients
of the fourth level are concatenated to form the final
representation.

• DFT is a widely used transformation technique to extract
frequency information from time series. We transformed
the original time series into the frequency domain and
extracted the DFT coefficients as features.

• DTW that uses dynamic programming technique to
determine the best alignment of two sequences is able to
deal with temporal drift between time series. The distance
matrix of each pair of the test time series and the training
time series is calculated based on the unconstrained DTW.
This distance matrix is used as input of the NN classifier.

• The BoP representation that represents a time series as a
histogram of local patterns is very similar to the proposed
bag-of-words representation. The size of alphabet r and
the number of symbols w are empirically set to 4 and 6,
respectively. We varied the length of local segments in
the bag-of-patterns representation from 16 to 320 with a
step of 16. The best accuracy is reported for comparison.



TABLE III

Comparison of the classification accuracy (%) on the epileptic EEG dataset.

Researchers              Datasets          Num of Class   Methods                                                       Accuracy (%)
Kannathal et al. [1]     A, E              2              Entropy+adaptive neuro-fuzzy inference system                 95
Polat and Güneş [30]     A, E              2              FFT+decision tree                                             98.72
Yuan et al. [31]         D, E              2              Nonlinear features+extreme learning machine                   96.00
Ocak [3]                 (A, C, D), E      2              DWT+approximate entropy                                       96.65
Guo et al. [2]           A, E              2              Line length feature based on DWT+artificial neural networks   99.6
                         (A, C, D), E      2              Line length feature based on DWT+artificial neural networks   97.75
                         (A, B, C, D), E   2              Line length feature based on DWT+artificial neural networks   97.77
Güler and Übeyli [21]    A, B, C, D, E     5              Raw data+support vector machine                               [illegible]
                                                          Raw data+probabilistic neural network                         [illegible]
                                                          Raw data+MLPNN                                                [illegible]
                                                          DWT and Lyapunov exponents+support vector machine             99.28
                                                          DWT and Lyapunov exponents+probabilistic neural network       98.05
                                                          DWT and Lyapunov exponents+MLPNN                              93.63
This work                A, E              2              Bag-of-words+1-NN                                             99.5
                         (A, C, D), E      2              Bag-of-words+1-NN                                             99.0
                         (A, B, C, D), E   2              Bag-of-words+1-NN                                             99.2
                         A, B, C, D, E     5              Bag-of-words+1-NN                                             93.8
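For the DTW baseline in the list above, the unconstrained alignment cost can be computed with a standard dynamic program, sketched below in Python. The absolute difference as local cost and the function name `dtw_distance` are assumptions of this example; the text does not specify the local cost used in the experiments.

```python
def dtw_distance(a, b):
    """Unconstrained DTW cost between two 1-D sequences, O(len(a) * len(b))."""
    inf = float("inf")
    # prev[j] holds the best cost of aligning the processed prefix of `a`
    # with the first j points of `b`.
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            curr.append(cost + min(prev[j], prev[j - 1], curr[j - 1]))
        prev = curr
    return prev[-1]

same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])  # 0: only a time stretch
offset = dtw_distance([0, 0], [1, 1])               # 2
```

The quadratic cost in the sequence lengths is what makes downsampling attractive before DTW is applied to long series.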

Since the time series in the ECG-15 dataset have different
lengths (2048 to 4096), we resized all of them to the
same length of 4096 using bilinear interpolation so that the
DWT and DFT based features have the same dimension. When
calculating the DTW distance, we reduced all the time series
in the three datasets to a length of 820, a downsampling
rate of about 5, because DTW is computationally expensive.
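
The resizing step can be sketched as follows. The text states bilinear interpolation; for a one-dimensional series this sketch assumes plain linear interpolation through `numpy.interp`, and it reuses the same routine for the reduction to 820 samples, which is also an assumption. The helper name `resize_series` and the synthetic input are hypothetical.

```python
import numpy as np

def resize_series(x, new_len):
    """Resample a 1-D series to new_len samples by linear interpolation."""
    x = np.asarray(x, dtype=float)
    old_pos = np.linspace(0.0, 1.0, num=len(x))    # original sample positions
    new_pos = np.linspace(0.0, 1.0, num=new_len)   # target sample positions
    return np.interp(new_pos, old_pos, x)

# Synthetic stand-in for a record of 2048 samples.
ecg = np.sin(np.linspace(0.0, 20.0 * np.pi, 2048))
ecg_4096 = resize_series(ecg, 4096)      # common length for DWT/DFT features
ecg_820 = resize_series(ecg_4096, 820)   # 4096 / 820 is roughly 5, for DTW
```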

Table II summarizes the best results achieved by the proposed
approach and the other methods. It can be seen that
the proposed approach achieves the highest accuracies (93.8%
on the EEG dataset, 99.5% on the ECG-40 dataset, and
100% on the ECG-15 dataset), which illustrates
the effectiveness of the bag-of-words representation. The BoP
representation obtains accuracies comparable to those of the
bag-of-words representation on the ECG-40 and ECG-15
datasets. However, the proposed bag-of-words representation
performs significantly better than the BoP representation on
the EEG dataset. The DFT feature and DTW distance methods
outperform the DWT based method, probably because
the DFT and DTW can deal with temporal shift
between sequences better than the DWT.



The EEG dataset used in our experiment is a popular dataset
for automatic epileptic seizure classification and localization.
Table III provides a comparison of the classification accuracies
between the proposed bag-of-words method and previous
state-of-the-art approaches in the literature. It should be noted
that the comparison is not direct, since our
method classifies the time series at the sequence level, while
the other methods classify segments extracted from the
time series. Some works used only several subsets of the whole
EEG dataset to construct a 2-class dataset, while others used
the whole EEG dataset with 5 classes. For the 2-class
classification, the bag-of-words method outperforms most of the
other methods. For the 5-class classification, where the whole
EEG dataset is used, the classification accuracies of the support
vector machine (SVM), probabilistic neural network (PNN)
and multilayer perceptron neural network (MLPNN) with
raw data are 75.60%, 72.00% and 68.80% [21], respectively.
When features extracted from the DWT and Lyapunov exponents
are used, the corresponding accuracies increase to 99.28%,
98.05% and 93.63% [21], respectively. The result obtained
by the proposed BoW representation with the simplest NN
classifier (93.8%) is lower than those achieved by SVM and
PNN with features based on the DWT and Lyapunov exponents.
However, it is slightly higher than the result obtained by
MLPNN (93.63%) with features based on the DWT and Lyapunov
exponents.

E. Robustness to Noise 

This experiment is designed to investigate the robustness
of the bag-of-words representation to noise. All signals in
the EEG, ECG-40 and ECG-15 datasets were corrupted by
zero-mean white Gaussian noise. The standard deviation of
the white Gaussian noise was varied so that the SNRs lie
between 10 dB and 0 dB. The training data and the test data
were separated exactly as in the previous
experiments. Table IV summarizes the classification accuracies
on the three datasets contaminated by white Gaussian
noise at different SNRs. It can be seen that the bag-of-words
approach is relatively robust to noise. The accuracies
decrease by less than 2 percentage points when the SNR is 10 dB.
Even under considerable noise contamination at an SNR of 0 dB,
the accuracies drop by less than 10 percentage points for the EEG
and ECG-40 datasets, and by less than 2 percentage points for the
ECG-15 dataset.

TABLE IV

Classification accuracies (%) on the three datasets corrupted
by zero-mean white Gaussian noise.

SNR | EEG | ECG-40 | ECG-15
10 dB | 92.6 | 98.9 | 99.8
8 dB | 91.8 | 98.4 | 99.7
6 dB | 91.2 | 97.6 | 99.6
4 dB | 90.4 | 95.5 | 99.2
2 dB | 88.8 | 92.6 | 98.9
0 dB | 85.2 | 89.9 | 98.6
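
The relation between the noise standard deviation and the target SNR can be sketched as follows. The sketch assumes the usual definition SNR = 10 log10(P_signal / P_noise) with the signal power taken as the mean squared amplitude; the function name `add_white_gaussian_noise` and the synthetic signal are hypothetical, and the exact power definition used in the experiments is not stated here.

```python
import numpy as np

def add_white_gaussian_noise(x, snr_db, rng=None):
    """Return x corrupted by zero-mean white Gaussian noise at the given SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    signal_power = np.mean(x ** 2)                        # assumed power definition
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise

# Synthetic stand-in for one record, corrupted at the SNR levels of the experiment.
signal = np.sin(np.linspace(0.0, 8.0 * np.pi, 4096))
noisy_versions = {snr: add_white_gaussian_noise(signal, snr)
                  for snr in (10, 8, 6, 4, 2, 0)}
```

At 0 dB the noise power equals the signal power, which is the most severe setting considered.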

V. Discussion 

Although the bag-of-words representation ignores the temporal
order of local segments, it is able to effectively capture
high-level structural information because the
frequency with which the codewords (local segments) occur in a
time series is well utilized. However, since the local segments
are extracted by sliding a window along the time series, a time
series that is not sufficiently long cannot provide enough
local segments to capture its local structures.
Therefore, the bag-of-words representation may be ineffective
for short time series, from which too few meaningful and
discriminative local segments can be extracted.
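
The dependence on series length can be made concrete with a minimal sketch of sliding-window extraction and histogram construction. For a series of length T, window length w and stride s (with T at least w), the window yields floor((T - w) / s) + 1 segments: with an illustrative w = 128 and s = 1, a series of 4096 samples gives 3969 segments, while one of 150 samples gives only 23. The function names, the stride parameter and the matching of raw segments to codewords by Euclidean distance are assumptions for illustration; any feature extraction applied to the segments is omitted.

```python
import numpy as np

def extract_segments(x, seg_len, stride=1):
    """Slide a window of seg_len samples along x and stack the local segments."""
    x = np.asarray(x, dtype=float)
    if len(x) < seg_len:
        return np.empty((0, seg_len))      # series too short for a single segment
    starts = range(0, len(x) - seg_len + 1, stride)
    return np.stack([x[s:s + seg_len] for s in starts])

def bow_histogram(x, codebook, seg_len, stride=1):
    """Count how often each codeword is the nearest one to a local segment."""
    codebook = np.asarray(codebook, dtype=float)
    segs = extract_segments(x, seg_len, stride)
    hist = np.zeros(len(codebook))
    if len(segs) == 0:
        return hist
    # Squared Euclidean distance from every segment to every codeword.
    d2 = ((segs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    np.add.at(hist, d2.argmin(axis=1), 1)  # one count per segment
    return hist
```

The histogram entries sum to the number of extracted segments, so a short series spreads only a handful of counts over the codebook.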

The size of the codebook N is pre-defined and empirically
determined in the method. A compact codebook of small
size has limited discriminative ability, while a codebook
of large size is likely to introduce noise. How to adaptively
set the optimal size of the codebook so that it is
compact and yet discriminative is still an open question. Some
criteria can be defined to merge entries of a codebook to
construct an adaptive codebook. For instance, the method
in [32] utilized the Maximization of Mutual Information (MMI)
principle to estimate the optimal N: two entries of a codebook
are merged by maximizing the mutual information in an
unsupervised way. Creating a codebook with adaptive size will
be investigated in our future work.

VI. Conclusion 

In this paper, we proposed a bag-of-words representation for
biomedical time series analysis. The proposed method treats
a time series as a document and local segments extracted
from the time series as words. The time series is represented
as a histogram of codewords. Although the temporal order
information of the local segments is ignored, both local
structure and global structure information of the time series
are captured. Experimental results on three publicly available
datasets demonstrate that the bag-of-words representation is
effective for characterizing biomedical time series such as
EEG and ECG signals. Furthermore, the bag-of-words
representation is not only insensitive to the length of local segments
and the size of the codebook, but also robust to noise. The distance
measures for comparing histograms were also investigated
in the experiments, showing that the Chi-Squared distance
measure is more suitable for comparing histograms than the
other distance measures. We compared the performance of
the bag-of-words representation with several state-of-the-art
approaches in the literature. Experimental results show that
the bag-of-words representation with the simplest 1-Nearest
Neighbor (1-NN) classifier achieves classification accuracies
comparable to or higher than those of the other approaches.